chore: fix Docker cqlsh install and improve simulation diagnostics#7691
chore: fix Docker cqlsh install and improve simulation diagnostics#7691zawadzkidiana wants to merge 1 commit intocadence-workflow:masterfrom
Conversation
- Install py3-setuptools, py3-wheel, and ca-certificates in Docker images so pip3 install cqlsh succeeds on newer Alpine/pip - Add --no-build-isolation to cqlsh pip install - Return structured JSON from simulation worker /health endpoint with cluster info, ready domain count, and last errors - Add per-endpoint error/status logging in simulation health check with 1s HTTP client timeout to prevent hangs Signed-off-by: Diana Zawadzki <dzawa@live.de>
🔍 CI failure analysis for 06b0385: The codecov test failure in TestPollForDecisionTasks is a flaky test unrelated to this PR's Docker and simulation changes.IssueTest Root CauseThis failure is unrelated to the PR changes. This PR modifies:
The failing test is in the matching engine service testing poller behavior, which has no connection to Docker builds or simulation health checks. DetailsEvidence of Flakiness
The test expects a poller to wait 0ms but it waited 3ms, suggesting a timing race in the test setup or teardown. Code Review ✅ ApprovedClean infrastructure PR: Docker build fixes for Alpine 3.18 pip compatibility and improved simulation health check diagnostics with proper concurrency handling. No issues found. Rules ❌ No requirements metRepository Rules
OptionsAuto-apply is off → Gitar will not commit updates to this branch. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
fixed with: #7690 |
What changed?
Fix Docker
cadence-auto-setupimage build and improve replication simulation test diagnostics.py3-setuptools,py3-wheel, andca-certificatesin Docker images sopip3 install cqlshsucceeds on newer Alpine/pip--no-build-isolationto cqlsh pip install/healthendpoint with cluster info, ready domain count, and last errorsWhy?
These changes were extracted from the replication histogram metric PRs (#7683, #7684) where they were originally included to unblock CI. They are prerequisites for those PRs to pass CI, but are logically independent infrastructure fixes.
The Docker
cadence-auto-setupimage build was broken on Alpine 3.18 with newer pip versions.pip3 install cqlshfails becausesetuptoolsandwheelare no longer bundled by default — they must be explicitly installed as system packages. The--no-build-isolationflag is also needed because cqlsh's build dependencies cannot be resolved in an isolated environment without pre-installed setuptools. Theca-certificatesaddition in the dockerize stage ensures HTTPS downloads (e.g. the dockerize release tarball via wget) succeed reliably. Without these fixes, all CI jobs that build or use the Docker image — including the replication simulation tests that validate the histogram additions — fail before any test code runs.The simulation worker
/healthendpoint previously returned empty 200/503 responses with no body, making it impossible to diagnose readiness failures in CI logs. When a health check failed, the test logged only "Workers are not reporting healthy yet" with no indication of whether the failure was DNS resolution, connection refused, HTTP 503, or a specific domain not being ready. The structured JSON response and per-endpoint logging make CI failures immediately actionable.